Topic recognition for news speech based on keyword spotting

Authors

  • Yoichi Yamashita
  • Toshikatsu Tsunekawa
  • Riichiro Mizoguchi
Abstract

This paper describes topic identification for Japanese TV news speech based on the keyword spotting technique. Three thousand nouns that contribute to topic identification are selected as keywords, using mutual information and word length as criteria. This keyword set identified the correct topic for 76.3% of articles in newspaper text data. Further, we performed keyword spotting on TV news speech and identified the topics of the spoken message by calculating the likelihood of each topic from the acoustic score of each spotted word and the topic probability of that word. To neutralize the effect of false alarms, the bias of the topics in the keyword set is removed. The topic identification rate is 66.5%, assuming that identification is correct if the correct topic is included in the top three topics. Removing the bias improved the identification rate by 6.1%.
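The abstract does not give the scoring formula, so the following is only a minimal sketch of how an acoustic score and a per-word topic probability could be combined. It assumes a weighted sum over spotted words and models bias removal as division by each topic's share of the keyword set. The function name `rank_topics`, the parameter names, and the example words, topics, and numbers are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def rank_topics(spotted, topic_prob, keyword_topic_bias=None, top_n=3):
    """Return the top_n topics for one spoken article.

    spotted: list of (word, acoustic_score) pairs from the keyword spotter.
    topic_prob: dict mapping word -> {topic: P(topic | word)}.
    keyword_topic_bias: optional dict mapping topic -> share of the keyword
        set tied to that topic. Dividing by it is an ASSUMED form of the
        bias removal mentioned in the abstract.
    """
    scores = defaultdict(float)
    for word, acoustic in spotted:
        # Each spotted word votes for its topics, weighted by how
        # confident the spotter was (ASSUMED combination rule).
        for topic, p in topic_prob.get(word, {}).items():
            scores[topic] += acoustic * p

    if keyword_topic_bias:
        for topic in scores:
            scores[topic] /= keyword_topic_bias.get(topic, 1.0)

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, _ in ranked[:top_n]]


# Hypothetical example data.
topic_prob = {
    "election": {"politics": 0.9, "economy": 0.1},
    "yen": {"economy": 0.8, "politics": 0.2},
    "goal": {"sports": 1.0},
}
spotted = [("election", 0.8), ("yen", 0.5), ("goal", 0.2)]
bias = {"politics": 0.6, "economy": 0.2, "sports": 0.2}

print(rank_topics(spotted, topic_prob))
print(rank_topics(spotted, topic_prob, keyword_topic_bias=bias))
```

Under this sketch, an article counts as correctly identified when its true topic appears in the returned list, which mirrors the top-three criterion used for the 66.5% figure. Dividing by the topic share keeps a topic that dominates the keyword set from collecting score through false alarms alone.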

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Confidence Measure for Utterance Verification in Keyword Spotting System

In this article, we propose an utterance verification technique for keyword spotting. The keyword spotting system analyzes given spoken content and searches for every speech segment in which one of the predefined keywords is uttered. To maintain stable recognition performance in the system, we propose an utterance verification technique that verifies whether a found utterance, or a candidate keywo...

Spanish Keyword Spotting System Based on Filler Models, Pseudo N-gram Language Model and a Confidence Measure

To efficiently organize many hours of audio content, such as meetings and radio news, searching for spoken keywords is essential. One approach uses filler models to account for non-keyword intervals. Another approach uses a large vocabulary continuous speech recognition (LVCSR) system, which retrieves a word string and then searches for the keywords in this string. This approach yields high p...

Comparison of keyword spotting methods for searching in speech

This paper presents and discusses keyword spotting methods for searching in speech. In contrast with searching in text, searching in speech, or in multimedia data generally, still represents a challenge. The aim of the paper is to present a keyword spotting (KWS) method based on a large vocabulary continuous speech recognition (LVCSR) system, based on a phonetic decoder, and keyword spotting u...

A new keyword spotting algorithm with pre-calculated optimal thresholds

Keyword spotting is a very forward-looking and promising branch of speech recognition. This paper presents an HMM-based keyword spotting system that works with a new algorithm. The first discussion topic is the description of the search algorithm, which needs no representation of the non-keyword parts of the speech signal. For this purpose, the computation of the HMM scores and the Viterbi algo...

Publication date: 1998